Haiti
Country:
- Asia > China (0.65)
- Asia > Middle East > Iran (0.48)
- North America > United States > Illinois (0.05)
- (7 more...)
Industry:
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Navy (1.00)
Country:
- Asia > Middle East > Israel (0.66)
- North America > United States > New York (0.30)
- North America > Haiti (0.15)
- (11 more...)
Industry:
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Technology:
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.85)
Country:
- North America > United States (0.67)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Peru (0.04)
- (34 more...)
Genre:
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
Country:
- North America > United States (0.14)
- North America > Haiti (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- (11 more...)
Genre:
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Industry:
- Leisure & Entertainment > Sports > Soccer (1.00)
- Education (1.00)
- Media > Music (0.69)
- Information Technology (0.68)
Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
- Information Technology > Communications > Social Media (0.93)
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
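The length disparity described in the abstract can be illustrated with a minimal sketch. It is not the paper's method or code: `byte_tokenize` is a hypothetical stand-in tokenizer (one token per UTF-8 byte), `tokenization_premium` is a helper name chosen for this example, and the sample phrases are illustrative only. Any real tokenizer that maps a string to a list of tokens could be passed in instead.

```python
from typing import Callable, List, Sequence


def byte_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def tokenization_premium(
    tokenize: Callable[[str], Sequence],
    text_a: str,
    text_b: str,
) -> float:
    """Return how many times longer text_a tokenizes than text_b."""
    len_b = len(tokenize(text_b))
    if len_b == 0:
        raise ValueError("text_b must produce at least one token")
    return len(tokenize(text_a)) / len_b


if __name__ == "__main__":
    # Illustrative parallel phrases (assumed to be rough translations).
    english = "Good morning"
    greek = "Καλημέρα"
    ratio = tokenization_premium(byte_tokenize, greek, english)
    print(f"Greek/English token-length ratio: {ratio:.2f}")
```

A ratio above 1 means the first text costs more tokens than the second for the same content, which is the kind of gap the abstract reports reaching up to 15 times for some language pairs and tokenizers.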
Country:
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (38 more...)
Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Country:
- North America > United States > Maryland (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California (0.04)
- (3 more...)
Country:
- South America > Brazil (0.04)
- North America > United States (0.04)
- Asia > India (0.04)
- (47 more...)
Country:
- South America > Brazil (0.04)
- North America > United States (0.04)
- Asia > Indonesia (0.04)
- (47 more...)
Technology:
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Country:
- North America > Canada (0.14)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (31 more...)
Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science (0.68)
Country:
- Asia > Russia (1.00)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.41)
- Asia > Middle East > Palestine > Gaza Strip > Rafah Governorate > Rafah (0.29)
- (23 more...)
Industry:
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Regional Government > Europe Government (0.70)
Technology:
- Information Technology > Communications > Social Media (0.74)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.48)